Cache memory

ABSTRACT

A cache memory comprises a first set of storage locations for holding syllables and addressable by a first group of addresses; a second set of storage locations for holding syllables and addressable by a second group of addresses; addressing circuitry operable to provide in each addressing cycle a pair of addresses comprising one from the first group and one from the second group, thereby accessing a plurality of syllables from each set of storage locations; and selection circuitry operable to select from said plurality of syllables to output to a processor lane based on whether a required syllable is addressable by an address in the first or second group.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is related to European Patent Application No. 07252666.8, filed Jul. 2, 2007, entitled “CACHE MEMORY”. European Patent Application No. 07252666.8 is assigned to the assignee of the present application and is hereby incorporated by reference into the present disclosure as if fully set forth herein. The present application hereby claims priority under 35 U.S.C. §119(a) to European Patent Application No. 07252666.8.

TECHNICAL FIELD

This invention relates broadly to computer system architectures and particularly to a cache memory and to a processor using a cache memory.

BACKGROUND

Processors are known which execute very long instruction word (VLIW) instructions. Generally, VLIW instructions are variable length and are composed of syllables. Each instruction is termed a bundle, and in some examples a bundle can consist of one, two, three or four syllables. A VLIW processor executes VLIW instructions (bundles) on every cycle where it is not stalled. FIG. 1 illustrates a Prior Art layout in a memory 2 of such instructions. It will be appreciated that FIG. 1 shows only a very small part of a memory 2 and in particular shows only the rows within one sector of the memory. In this document, “rows” are used to define a region of memory relating to the issue width of the processor. In the examples discussed herein a row is a 128 bit aligned region of memory. In a VLIW memory as exemplified herein, bundles are aligned to 32 bit boundaries in memory. Therefore, a maximum width (128 bit) bundle has only a one in four chance of being aligned to a 128 bit boundary. In most cases it will not be 128 bit aligned. FIG. 1 shows the case where an instruction I₁ is 128 bit aligned beginning at row address addri and a situation where another instruction, I₂, is misaligned, commencing at a row address addr[j+64] and with its last syllable at address addr[k+64]. In this case, the address addrj would represent the 128 bit aligned address for the memory 2.
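By way of illustration only, the alignment argument above can be checked with a short Python sketch. It is not part of the disclosed circuitry. Byte addressing is assumed for convenience (4 bytes per 32 bit syllable, 16 bytes per 128 bit row), and the names `rows_touched`, `SYLLABLE_BYTES` and `ROW_BYTES` are hypothetical.

```python
SYLLABLE_BYTES = 4   # one 32 bit syllable
ROW_BYTES = 16       # one 128 bit row


def rows_touched(start_byte: int, n_syllables: int) -> list[int]:
    """Return the indices of the 128 bit rows covered by one bundle."""
    if start_byte % SYLLABLE_BYTES != 0:
        raise ValueError("bundles are aligned to 32 bit boundaries")
    if not 1 <= n_syllables <= 4:
        raise ValueError("a bundle has one to four syllables")
    first = start_byte // ROW_BYTES
    last = (start_byte + n_syllables * SYLLABLE_BYTES - 1) // ROW_BYTES
    return list(range(first, last + 1))


# Of the four possible 32 bit aligned start offsets within a row, find those
# that keep a maximum width (four syllable) bundle inside a single row.
aligned_offsets = [
    offset
    for offset in range(0, ROW_BYTES, SYLLABLE_BYTES)
    if len(rows_touched(offset, 4)) == 1
]
```

Only the offset 0 survives in `aligned_offsets`, which is the one in four chance referred to above; every other start offset makes a four syllable bundle span two rows.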

In order therefore to allow bundles to be assembled, existing instruction caches for use with such a memory 2 are constructed to allow four syllable reads to be made from arbitrary 32 bit aligned addresses. A direct mapped cache permitting this is shown in FIG. 2. FIG. 2 illustrates a Prior Art cache 4 having four banks B0, B1, B2, B3. In this example, each bank has a capacity of 8 kilobytes and is 32 bits wide. The cache 4 is connected to an execution unit 6 which comprises a plurality of execution pipelines or lanes L0, L1, L2, L3. Each lane accepts a 32 bit wide syllable from the respective bank of the cache.

In order to allow for non-aligned addresses, each bank comprises an individually addressable RAM. The RAMs are connected to address circuitry 8 via respective address lines ADDR1 . . . ADDR4. Each RAM is addressed by supplying a row address along the respective address line. In the case of instruction I₁, it can be seen that the row address for row i can be fed to each of the banks. However, for instruction I₂, banks B0 and B1 need to be addressed by the address for row j, whereas banks B2 and B3 need to be addressed by the address for row k. This is shown in more detail in FIG. 3 which illustrates where the syllables (S1₁ . . . S4₂) of instructions I₁ and I₂ are stored in the cache 4. As is well known, caches are arranged in lines. In a direct mapped cache, each bank has a plurality of addressable locations, each location constituting one cache line. When a cache miss happens a full line is fetched from memory. In principle a line can be any number of bytes, but in the examples discussed herein the cache line length is 64 bytes. Each line has a tag stored with it which states where in memory the cache line came from; this stores the upper bits of the address. The lower bits of the address are used to index the cache (that is to look up the line within the cache) to access the data. Thus, each line contains syllables from main memory and the cache tag (upper bits of the address). Each line is addressable by a cache line address which is constituted by a number of bits representing the least significant bits of the address in main memory of the syllables stored at that row. It will be appreciated therefore that where there is reference here to a row address, this is identified by a number of least significant bits of the address in main memory. Thus, there may be a number of rows in the memory sharing those least significant bits which would map onto any particular line of the cache. In FIG. 1, one sector of the memory is shown which has one row address addri mapping onto line i of the cache.
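The split of an address into a tag (upper bits) and an index (lower bits) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the circuit of FIG. 2: byte addresses are assumed, the 64 byte line length is taken from the example above, and `NUM_LINES = 512` is a hypothetical figure obtained by assuming 32 kilobytes of cache in total. The names `split_address` and `is_hit` are likewise hypothetical.

```python
LINE_BYTES = 64    # cache line length used in the examples herein
NUM_LINES = 512    # assumed for illustration: 32 KB total / 64 byte lines


def split_address(addr: int) -> tuple[int, int, int]:
    """Split a byte address into (tag, line index, offset within the line)."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_LINES   # lower bits select the line
    tag = addr // (LINE_BYTES * NUM_LINES)     # upper bits are stored as tag
    return tag, index, offset


def is_hit(stored_tags: dict[int, int], addr: int) -> bool:
    """Direct mapped lookup: a hit only if the indexed line holds this tag."""
    tag, index, _ = split_address(addr)
    return stored_tags.get(index) == tag
```

Two addresses that differ by `LINE_BYTES * NUM_LINES` bytes share the same index but carry different tags, which is why they compete for the same line in a direct mapped cache.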

Direct mapped caches, while simple to construct, have serious performance limitations in some contexts because instructions can only be written into a line of the cache to which they legitimately map. This may mean that there are unused parts of the cache, while other parts of the cache are being constantly overwritten.

SUMMARY OF THE INVENTION

The deficiencies of the prior art as described above are addressed by the teachings of the present patent document. Broadly, one exemplary cache memory may include a plurality of storage locations, addressing circuitry and selection circuitry. First and second sets of storage locations operate to hold syllables and are respectively addressable by a first group and a second group of addresses. The addressing circuitry operates to provide in each addressing cycle a pair of addresses comprising one from the first group and one from the second group, thereby accessing a plurality of syllables from each set of storage locations. The selection circuitry operates to select from the plurality of syllables to output to a processor lane based on whether a required syllable is addressable by an address in one of the first and second groups.

FIG. 4 illustrates one exemplary option for a four-way set associative cache which has greater flexibility than the direct mapped cache discussed above. The cache in FIG. 4 has four ways, WAY0, WAY1, WAY2, WAY3, each way comprising four banks providing a total of sixteen banks. Each bank has a capacity in this example of two kilobytes, but it will be appreciated that any suitable capacity could be adopted. Each bank has a width of 32 bits. The banks of each way are grouped in the sense that bank0, bank4, bank8, bank12 feed one lane, in this case lane 0, of the execution unit 6. There is a similar correspondence for the other banks and lanes. The ways can in principle be commonly addressed row-wise by a common row address. However, the multiplicity of banks within each way has to be taken into account, which in effect means that address circuitry 8 supplies four address lines, each allowing an address to be supplied to a particular bank of a particular way. Considering the grouped banks bank0, bank4, bank8, bank12, these will be addressed by a common row address supplied to the respective way. That row contains four cache entries, one for each bank of the grouped banks, bank0, bank4, bank8, bank12. Thus, this allows four cache entries for one row address. Each time a row is addressed therefore, four cache entries are output. A cache access circuit 10 receives the outputs and further receives the address which was supplied from the address circuitry 8. By making a comparison between the supplied address and the cache tag in the cache, the cache access circuit 10 determines which way holds the correct data and uses a control signal 12 to control a multiplexer 14₀ to select the correct data to supply to lane 0. Outputs from the remaining grouped banks have similar selection mechanisms with their outputs being connected to the cache access circuit 10, although for reasons of clarity the connections are not shown in full in FIG. 4. The multiplexers are referenced 14₁ for lane 1, 14₂ for lane 2 and 14₃ for lane 3.
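The way selection performed for one lane can be modelled, purely as a sketch, by the Python fragment below. It assumes that each way returns a (tag, syllable) pair for the addressed row and that a way matches when its stored tag equals the tag of the supplied address. The function name `select_way` is hypothetical, and details such as valid bits, which the description does not discuss, are omitted.

```python
def select_way(way_entries, wanted_tag):
    """Model of the tag comparison and multiplexer for a single lane.

    way_entries: one (tag, syllable) pair per way, all read from the same row.
    Returns (way number, syllable) for the matching way, or None on a miss.
    """
    for way, (tag, syllable) in enumerate(way_entries):
        if tag == wanted_tag:
            return way, syllable
    return None


# Four entries are output each time a row is addressed, one per way.
entries = [(0x10, "syllable-a"), (0x22, "syllable-b"),
           (0x31, "syllable-c"), (0x40, "syllable-d")]
hit = select_way(entries, 0x31)
miss = select_way(entries, 0x99)
```

Here `hit` identifies way 2 as holding the required data, while `miss` is `None`, corresponding to a cache miss.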

A set associative cache provides a much more flexible use of the cache memory than a direct mapped cache; however, the addressing structure is more complex.

It is an aim of the present invention to provide a cache memory which allows the flexibility of a set associative cache but with a simplified addressing structure.

According to an aspect of the present invention there is provided a cache memory comprising: a first set of storage locations for holding syllables and addressable by a first group of addresses; a second set of storage locations for holding syllables and addressable by a second group of addresses; addressing circuitry for providing in each addressing cycle a pair of addresses comprising one from the first group and one from the second group, thereby accessing a plurality of syllables from each set of storage locations; and means for selecting from said plurality of syllables to output to a processor lane based on whether a required syllable is addressable by an address in the first or second group.

Another aspect of the invention provides a cache memory comprising a plurality of ways, each way comprising: a first set of storage locations for holding syllables and a cache tag where said syllables are addressable by a first group of addresses; a second set of storage locations for holding syllables and a cache tag in main memory where said syllables are addressable by a second group of addresses; addressing circuitry for providing in each addressing cycle a pair of addresses comprising one from the first group and one from the second group, thereby accessing a plurality of syllables from each set of storage locations; and means for selecting from said plurality of syllables to output based on whether a required syllable is addressable by an address in the first or second group; the cache memory further comprising switching means for selecting from said outputted syllables, syllables associated with one of said ways, based on comparing at least part of the addresses provided by the addressing circuitry with the cache tags held in the storage locations.

In the following described embodiment, the first group of addresses are odd addresses, and the second group of addresses are even addresses, but it will be appreciated that other groupings are possible.

Other technical features may be readily apparent to one skilled in the art from the following figures, descriptions and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of this disclosure and its features, reference is now made to the following description, taken in conjunction with the accompanying drawings, in which:

FIG. 1 illustrates a prior art layout in a memory of VLIW syllables;

FIG. 2 is a prior art schematic block diagram of a direct mapped cache;

FIG. 3 illustrates a prior art storage of data in a direct mapped cache;

FIG. 4 is a schematic block diagram of a four way set associative cache;

FIG. 5 is a schematic block diagram of one embodiment of the invention;

FIG. 6 is a schematic block diagram of VLIW syllables stored in a memory;

FIG. 7 is a schematic block diagram of an embodiment of the invention including an instruction buffer;

FIG. 8 is a schematic block diagram of a cache for supplying an eight lane execution unit;

FIG. 9 is a schematic block diagram of a video/audio decode unit for a set top box; and

FIG. 10 is a schematic block diagram of a handheld personal computer with a mobile phone capability.

DETAILED DESCRIPTION

FIG. 5 illustrates an embodiment of the present invention in the form of a cache memory structure which achieves the benefits of a set associative cache but with reduced addressing complexity. In contrast to the cache structure of FIG. 4, each way of the cache comprises two banks, an even bank and an odd bank. As compared with the direct mapped arrangement of FIG. 2, the number of RAM banks is halved, therefore reducing area overhead. Each bank has a capacity of four kilobytes in this embodiment, but it will be appreciated that any suitable capacity could be utilised. The width of each bank is 128 bits. The banks are labelled according to the following protocol where B denotes bank, the numerical suffix denotes the way and the lower case letter suffix denotes whether it is even or odd. For example B1_(e) denotes the even bank of way 1. For reasons of clarity not all of the denotations are shown in FIG. 5. In the embodiment of FIG. 5, address circuitry 18 provides two addresses along respective address paths labelled Addrn and Addrn+128. As will be understood from the preceding discussion, when fetching from address n, if n is 128 bit aligned, then n represents the row address for all syllables of a syllable fetch. If n is not 128 bit aligned, then syllables of a single fetch can lie either in row n or in the subsequent row, that is the row addressed by n+128. Consequently, the address lines from address circuitry 18 supply addresses for row pairs, where each pair is represented by address n and address n+128. The addresses of even numbered rows are supplied to the even banks B_(e), whereas the addresses of odd numbered rows are supplied to the odd banks B_(o).
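The routing of a row pair to the even and odd banks may be sketched as follows. This is an illustrative Python model only. It assumes that the fetch start is expressed as a syllable index (a hypothetical unit chosen for convenience, with four syllables per 128 bit row) and that rows are numbered consecutively, so that the row at address n+128 is simply the next row. The name `bank_addresses` is hypothetical.

```python
SYLLABLES_PER_ROW = 4   # a 128 bit row holds four 32 bit syllables


def bank_addresses(start_syllable: int) -> dict[str, int]:
    """Return the row number presented to the even and to the odd banks."""
    row_n = start_syllable // SYLLABLES_PER_ROW   # row containing address n
    row_next = row_n + 1                          # row at address n+128
    if row_n % 2 == 0:
        # Row n is even, so the following row is necessarily odd.
        return {"even": row_n, "odd": row_next}
    # Row n is odd, so the following row is necessarily even.
    return {"even": row_next, "odd": row_n}
```

Because consecutive rows always have opposite parity, one address always goes to the even banks and one to the odd banks, whichever row the fetch starts in.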

Each bank is constituted by an addressable RAM. Thus, each way comprises one even RAM and one odd RAM. Four multiplexers 24a, 24b, 24c, 24d are provided for each way because the syllable for each lane can come from either the odd RAM or the even RAM. The multiplexers 24a . . . 24d are controlled by respective signals 26a to 26d from an odd/even selection circuit 28 which takes information from the address circuitry 18 about whether the address is odd or even. The four selected syllables from each way are supplied to the lane multiplexers 14₀ . . . 14₃ via a path 30. A cache access circuit 32 selects the correct way to provide the syllables to the lanes based on the full address from the address circuitry 18 in a manner similar to that described above with reference to FIG. 4. Bundling logic 33 organises the syllables back into bundles for applying to the lanes.

FIG. 6 illustrates the organisation in one sector of the main memory 2 to explain how the cache memory structure of FIG. 5 operates. It is worth noting here that, in the event of a cache miss, a fetch is made from main memory of enough bytes to fill a cache line, 64 bytes in this example. This is dependent on the cache architecture and not directly of relevance to the present invention apart from realising that the capacity of a cache line has an impact on fetching as discussed later.

Reverting to FIG. 6, instruction I₁, comprising syllables S1₁, S2₁, S3₁ and S4₁, crosses a 128 bit boundary. It starts at address j+64, with its last syllable at address k+32, where j and k denote the rows. In the example of FIG. 6, j is even (e.g. row 6) and k is odd (e.g. row 7). Thus, to recall all syllables of instruction I₁ both rows j and k need to be addressed. This is achieved in the embodiment of FIG. 5 by issuing addresses j and j+128 on address paths Addrn and Addrn+128. Address j addresses the even banks, while address j+128 addresses the odd banks. In this case, the multiplexers 24a and 24b are set to “even”, while multiplexers 24c and 24d are set to “odd”. This is achieved by mapping from the two least significant bits of the address as follows:

addrn[1:0]   bank0   bank1   bank2   bank3
00           even    even    even    even
01           odd     odd     odd     odd
10           odd     odd     even    even
11           odd     odd     odd     even
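The FIG. 6 walk-through can also be traced in software. The Python sketch below lists, in bundle order (first syllable to last), whether each syllable of a fetch is read from the even or the odd bank. It models only the row geometry and is not a description of the selection circuit 28. The fetch start is assumed to be given as a syllable index with four syllables per row, and the name `mux_settings` is hypothetical.

```python
SYLLABLES_PER_ROW = 4   # a 128 bit row holds four 32 bit syllables


def mux_settings(start_syllable: int, n_syllables: int = 4) -> list[str]:
    """For each syllable of a fetch, in bundle order, name its source bank."""
    settings = []
    for i in range(n_syllables):
        row = (start_syllable + i) // SYLLABLES_PER_ROW
        settings.append("even" if row % 2 == 0 else "odd")
    return settings


# Instruction I1 of FIG. 6: row j = 6 (even), starting two syllables
# (64 bits) into the row, so the start index is 6 * 4 + 2 = 26.
fig6_example = mux_settings(6 * SYLLABLES_PER_ROW + 2)
```

For the FIG. 6 example the first two syllables come from the even bank and the last two from the odd bank, matching the settings stated for multiplexers 24a, 24b and 24c, 24d respectively.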

It will further be noted that the fetch for row k (addrj+128) fetches syllables S1₂ and S2₂ of instruction I₂. It will be clear that these syllables are contained in the odd bank B0_(o) but are not output via the multiplexers 24a . . . 24d. Nevertheless they are available to be accessed on the next processor cycle. This means that subsequent fetches can be aligned, for example the next access can dispatch addresses l and m (m being in the form l+128). In this case, l is even and m is odd. As compared with the earlier arrangements of FIGS. 2 and 4, address timing is faster due to the fact that there are half the number of addresses.

Misaligned fetches may only occur at each PC redirection. The term PC redirection used herein refers to a non-linear change in program count (PC) for any reason, such as taken branch, procedure call, exception or interrupt. That is, the first fetch after each PC redirection will result in five to eight useful syllables. After the first fetch, subsequent fetches are aligned and will always fetch eight useful syllables until another PC redirection occurs, or until the end of the cache line is reached. Fetches described herein are only from one cache line. Any fetch which crosses a cache line will require an extra cache access to complete.

The syllables which are shown cross-hatched in FIG. 6 at the beginning of row j are discarded herein because they are before the target program count of the redirection and are therefore not useful at this stage. It will be appreciated that for any PC redirection, zero to three syllables may have to be discarded resulting in five to eight useful syllables as discussed above.
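The count of useful syllables per fetch can be worked through with a small Python sketch. It assumes the 64 byte cache line of the example (sixteen syllables, four rows), a target expressed as a syllable index, and a fetch that never reads beyond the current cache line. The function name `useful_syllables` is hypothetical.

```python
SYLLABLES_PER_ROW = 4     # a 128 bit row holds four 32 bit syllables
SYLLABLES_PER_LINE = 16   # a 64 byte cache line holds sixteen syllables


def useful_syllables(target: int) -> int:
    """Useful syllables delivered by one fetch starting at syllable `target`."""
    discarded = target % SYLLABLES_PER_ROW            # before the target PC
    from_row_pair = 2 * SYLLABLES_PER_ROW - discarded
    left_in_line = SYLLABLES_PER_LINE - target % SYLLABLES_PER_LINE
    return min(from_row_pair, left_in_line)


# First fetch after a redirection to each of the four offsets within the
# first row of a cache line.
first_fetch_counts = [useful_syllables(offset) for offset in range(4)]
```

`first_fetch_counts` covers the five to eight range stated above, with zero to three syllables discarded. When the target lies in the last row of a cache line the count is limited by the syllables remaining in that line.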

FIG. 7 is a schematic block diagram showing one way, WAY0, with its even and odd banks diagrammatically feeding eight syllables to a buffer 40 in each cycle. The buffer 40 receives syllables from the cache and bundles them to feed them to one to four lanes of the execution unit, depending on how they are bundled. However, instead of only supplying four syllables on each fetch directly to the lanes (as in FIG. 5), it is possible to supply eight syllables into a buffer on each fetch cycle. For the example of FIG. 6, the first eight syllables are those from rows j and j+128, and the next eight syllables in cycle 2 are those from rows l and m (l+128). For this reason the hatching used in FIG. 7 is the same as that used in FIG. 6 because the same syllables are denoted. Note that syllables which are hatched in the same hatching do not denote syllables of the same instruction necessarily, but syllables which are retrieved in the same fetch. The purpose of the buffer is to receive excess syllables fetched from the cache to reduce cache accesses and to hide stalls caused by fetches crossing cache lines. In the event of a PC redirection, the buffer is cleared as the contents are no longer relevant. Syllable fetches are redirected to the target PC required by the PC redirection which has just occurred.

The buffer 40 has the capacity of three complete bundles (three to twelve syllables). In the case that a fetch crosses a cache line boundary, then it will take two cycles to complete the fetch as two separate cache accesses are made. If there are instructions stored in the buffer then this extra cycle of latency can be hidden from the execution pipeline. With the cache memory structure of FIG. 5, five to eight syllables will be received per fetch into the buffer, except in the case where a fetch reaches the end of a cache line where only the syllables remaining from the cache line will be fetched, which will be one to eight if the target of a branch is within eight syllables of the end of a cache line or four or eight syllables for a second or later fetch following a PC redirection. The diagram in FIG. 7 is schematic only and shows the switching circuitry necessary to supply eight syllables per cycle to the buffer 40 as a general block 42. The individual components of the switching circuitry are shown in more detail in FIG. 8.
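The behaviour of the buffer 40 can be approximated by the following Python sketch. It is a simplified software model under several assumptions that go beyond the description: the capacity is taken as twelve syllables (three maximum width bundles), the size of the next bundle is supplied by the caller, and fetched syllables that do not fit are simply not accepted. The class name `SyllableBuffer` and its methods are hypothetical.

```python
from collections import deque


class SyllableBuffer:
    """Illustrative model of an instruction buffer between cache and lanes."""

    CAPACITY = 12   # assumed: three bundles of up to four syllables each

    def __init__(self):
        self._queue = deque()

    def free_slots(self) -> int:
        return self.CAPACITY - len(self._queue)

    def push(self, syllables) -> int:
        """Store fetched syllables that fit; return how many were accepted."""
        accepted = list(syllables)[: self.free_slots()]
        self._queue.extend(accepted)
        return len(accepted)

    def pop_bundle(self, size: int):
        """Issue one bundle of `size` syllables, or None if not yet available."""
        if not 1 <= size <= 4:
            raise ValueError("a bundle has one to four syllables")
        if len(self._queue) < size:
            return None   # the execution pipeline would stall here
        return [self._queue.popleft() for _ in range(size)]

    def flush(self) -> None:
        """Clear the buffer, as on a PC redirection."""
        self._queue.clear()
```

A fetch of eight syllables followed by the issue of a three syllable bundle leaves five syllables buffered. These remaining syllables are what allow a two cycle fetch across a cache line boundary to be hidden from the execution pipeline.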

In particular, FIG. 8 illustrates the multiplexers 24a . . . 24d which are designated only for WAY0, but of course which are similarly present for WAYS 1 to 3. The control signals for these multiplexers are not shown in FIG. 8, but they will be present in the manner as described with reference to FIG. 5. Each such multiplexer 24a . . . 24d has two outputs, one supplying an “even” set of multiplexers 14₀ . . . 14₃, and the other supplying an “odd” set of multiplexers 14₄ . . . 14₇. The “even” and “odd” multiplexers are controlled by a signal from a cache access circuit 10 similar to that shown in FIG. 5. The control signals are not shown in FIG. 8. These multiplexers are labelled 14₀ to 14₇. Although not shown in FIG. 8, bundling logic like that illustrated in FIG. 5 can be used with the embodiment of FIG. 8.

The bundling logic is used in the context of VLIW instructions to dispatch to the lanes syllables which are associated in a particular bundle for each cycle.

Thus according to the above described embodiment an improved cache structure is provided which has a number of advantages.

The instruction cache can be accessed less often, reducing the overall power consumption involved in accesses.

There is an increased likelihood of hiding fetch stalls caused by fetches crossing cache lines. That is, fetching more syllables at once from the cache allows the buffer to fill faster so fetch stalls are more likely to be hidden (e.g. where a fetch crossing a cache line takes two cycles).

Moreover, there is a possibility to build an eight issue processor core with the instruction cache. In the above described embodiment of FIG. 5, the even and odd capability is used to enhance latency by using an instruction buffer 40. However, as shown in FIG. 8 the syllables could be fed directly to the lanes in an eight issue CPU core.

Reference will now be made to FIGS. 9 and 10 to illustrate applications of the cache memory discussed above. FIG. 9 illustrates a video/audio decode unit for a set top box. The unit comprises a main memory 50, a cache memory 52 in accordance with one of the embodiments described above and a processor 54. Video or audio data is streamed into the main memory as denoted by the incoming arrow at the top of FIG. 9 and processed by the processor 54 through the intervention of the cache 52 to generate decoded data indicated by the output arrow at the bottom of FIG. 9.

FIG. 10 is a schematic block diagram of a handheld personal computer with a mobile phone capability. The computer comprises a main memory 60, a cache 62 in accordance with one of the embodiments described above and a processor 64. In addition RF circuitry 66 is provided for implementing a wireless connection for transfer of data in a wireless network.

It may be advantageous to set forth definitions of certain words and phrases used in this patent document. The term “couple” and its derivatives refer to any direct or indirect communication between two or more elements, whether or not those elements are in physical contact with one another. The terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation. The term “or” is inclusive, meaning and/or. The phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like.

While this disclosure has described certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure, as defined by the following claims.

1. A cache memory comprising: a first set of storage locationsconfigured to hold instruction syllables and addressable by a firstgroup of addresses; a second set of storage locations configured to holdinstruction syllables and addressable by a second group of addresses;addressing circuitry configured to provide in each addressing cycle apair of addresses comprising one from the first group and one from thesecond group, thereby accessing a plurality of instruction syllablesfrom each set of storage locations; selection circuitry configured toselect from the plurality of instruction syllables to output to aprocessor lane based on whether a required instruction syllable isaddressable by an address in one of the first and second groups; and abuffer configured to hold the instruction syllables selected from theoutputted instruction syllables and to bundle the instruction syllablesin accordance with a VLIW (very long instruction word) format.
 2. Acache memory according to claim 1, wherein the first group of addressesare odd addresses, and the second group of addresses are even addresses.3. A cache memory according to claim 2, wherein the addressing circuitryis further configured to provide a control signal to the selectioncircuitry, the control signal identifying whether a required instructionsyllable is addressable by an address in the first or second group.
 4. Acache memory according to claim 3, wherein the selection circuitrycomprises a plurality of multiplexers, each multiplexer connected toreceive an instruction syllable addressable by the first group ofaddresses and an instruction syllable addressable by the second group ofaddresses, and to output one of the instruction syllables.
 5. A cachememory comprising a plurality of ways, each way comprising: a first setof storage locations configured to hold instruction syllables and acache tag where the instruction syllables are addressable by a firstgroup of addresses; a second set of storage locations configured to holdinstruction syllables and a cache tag in main memory where theinstruction syllables are addressable by a second group of addresses;addressing circuitry configured to provide in each addressing cycle apair of addresses comprising one from the first group and one from thesecond group, thereby accessing a plurality of instruction syllablesfrom each set of storage locations; selection circuitry configured toselect from the plurality of instruction syllables to output based onwhether a required instruction syllable is addressable by an address inthe first or second group; a buffer configured to hold the instructionsyllables selected from the outputted instruction syllables and tobundle the instruction syllables in accordance with a VLIW (very longinstruction word) format; and the cache memory further comprisingswitching circuitry configured to select, from the outputted instructionsyllables, instruction syllables associated with one of the ways basedon comparing at least part of the addresses provided by the addressingcircuitry with the cache tags held in the storage locations.
 6. A cachememory according to claim 5, wherein the first group of addresses areodd addresses, and the second group of addresses are even addresses. 7.A cache memory according to claim 6, wherein the addressing circuitry isfurther configured to provide a control signal to the selectioncircuitry, the control signal identifying whether a required instructionsyllable is addressable by an address in one of the first and secondgroups.
 8. A cache memory according to claim 7, wherein the selectioncircuitry comprises a plurality of multiplexers, each multiplexerconnected to receive an instruction syllable addressable by the firstgroup of addresses and an instruction syllable addressable by the secondgroup of addresses, and to output one of the instruction syllables.
 9. Acache memory according to claim 8, comprising buffer logic.
 10. A cachememory according to claim 9, wherein the addressing circuitry isconfigured to provide only a part of each address in the pair to accessthe instruction syllables.
 11. A processor comprising: a main memoryconfigured to hold instructions wherein each instruction comprises atleast one instruction syllable; a cache memory comprising a plurality ofways, each way comprising: a first set of storage locations configuredto hold instruction syllables and a cache tag where the syllables areaddressable by a first group of addresses, and a second set of storagelocations configured to hold instruction syllables and a cache tag inmain memory where the instruction syllables are addressable by a secondgroup of addresses; addressing circuitry configured to provide in eachaddressing cycle a pair of addresses comprising one from the first groupand one from the second group, thereby accessing a plurality ofinstruction syllables from each set of storage locations; selectioncircuitry configured to select from the plurality of instructionsyllables to output based on whether a required instruction syllable isaddressable by an address in the first or second group; switchingcircuitry configured to select, from the outputted instructionsyllables, instruction syllables associated with one of the ways basedon comparing at least part of the addresses provided by the addressingcircuitry with the cache tags held in the storage locations; and abuffer configured to hold the instruction syllables selected from theoutputted instruction syllables and to bundle the instruction syllablesin accordance with a VLIW (very long instruction word) format; and aplurality of execution lanes, each lane being supplied with aninstruction syllable selected from the outputted syllables.
 12. Aprocessor according to claim 11, wherein instruction syllables which arenot output by the selection circuitry in a first output cycle are heldin a buffer and output in a subsequent output cycle.
 13. A processoraccording to claim 12, wherein the number of execution lanes is four.14. A processor according to claim 11, wherein in the same output cycleinstruction syllables which are output by the selection circuitryaddressable by an address in the first group are supplied to a first setof execution lanes, and instruction syllables output by the selectioncircuitry addressable by an address in the second group are output to asecond set of execution lanes.
 15. A processor according to claim 14,wherein there are four execution lanes in each set.
 16. A processorcomprising: a main memory configured to hold instructions wherein eachinstruction comprises at least one instruction syllable; a cache memorycomprising a plurality of ways, each way comprising: a first set ofstorage locations configured to hold instruction syllables and a cachetag where the instruction syllables are addressable by a first group ofaddresses, and a second set of storage locations configured to holdinstruction syllables and a cache tag in main memory where theinstruction syllables are addressable by a second group of addresses;addressing circuitry configured to provide in each addressing cycle apair of addresses comprising one from the first group and one from thesecond group, thereby accessing a plurality of instruction syllablesfrom each set of storage locations; selection circuitry configured toselect from the plurality of instruction syllables to output based onwhether a required instruction syllable is addressable by an address inthe first or second group; and switching circuitry configured to select,from the outputted instruction syllables, instruction syllablesassociated with one of the ways based on comparing at least part of theaddresses provided by the addressing circuitry with the cache tags heldin the storage locations; a plurality of execution lanes, each lanebeing supplied with a instruction syllable selected from the outputtedinstruction syllables; and bundling logic connected to the plurality ofexecution lanes and adapted to receive the instruction syllables and toassemble a bundle in accordance with a very long instruction word (VLIW)format.
 17. A processor comprising: a main memory configured to hold instructions wherein each instruction comprises at least one instruction syllable; a cache memory comprising a plurality of ways, each way comprising: a first set of storage locations configured to hold instruction syllables and a cache tag where the instruction syllables are addressable by a first group of addresses, and a second set of storage locations configured to hold instruction syllables and a cache tag in main memory where the instruction syllables are addressable by a second group of addresses; addressing circuitry configured to provide in each addressing cycle a pair of addresses comprising one from the first group and one from the second group, thereby accessing a plurality of instruction syllables from each set of storage locations; selection circuitry configured to select from the plurality of instruction syllables to output based on whether a required instruction syllable is addressable by an address in the first or second group; switching circuitry configured to select, from the outputted instruction syllables, instruction syllables associated with one of the ways based on comparing at least part of the addresses provided by the addressing circuitry with the cache tags held in the storage locations; and a buffer configured to hold the instruction syllables selected from the outputted instruction syllables and to bundle the instruction syllables in accordance with a VLIW (very long instruction word) format; a plurality of execution lanes, each lane being supplied with an instruction syllable selected from the outputted instruction syllables; and a bundling logic connected to the plurality of execution lanes and adapted to receive the outputted instruction syllables and to assemble a bundle in accordance with a very long instruction word (VLIW) format.
 18. A processor according to claim 17, wherein in the same output cycle instruction syllables which are output by the selection circuitry addressable by an address in the first group are supplied to a first set of execution lanes, and instruction syllables output by the selection circuitry addressable by an address in the second group are output to a second set of execution lanes.
 19. A handheld personal computer comprising circuitry configured to establish a wireless connection and a processor, wherein the processor comprises: a main memory configured to hold instructions wherein each instruction comprises at least one instruction syllable; a cache memory comprising a plurality of ways, each way comprising: a first set of storage locations configured to hold instruction syllables and a cache tag where the instruction syllables are addressable by a first group of addresses, and a second set of storage locations configured to hold instruction syllables and a cache tag in main memory where the instruction syllables are addressable by a second group of addresses; addressing circuitry configured to provide in each addressing cycle a pair of addresses comprising one from the first group and one from the second group, thereby accessing a plurality of instruction syllables from each set of storage locations; selection circuitry configured to select from the plurality of instruction syllables to output based on whether a required instruction syllable is addressable by an address in the first or second group; switching circuitry configured to select from the outputted instruction syllables, instruction syllables associated with one of the ways, based on comparing at least part of the addresses provided by the addressing circuitry with the cache tags held in the storage locations; a plurality of execution lanes, each lane being supplied with an instruction syllable selected from the outputted instruction syllables; and a buffer configured to hold the instruction syllables selected from the outputted instruction syllables and to bundle the instruction syllables in accordance with a VLIW (very long instruction word) format.
 20. A video/audio decode unit for a set top box comprising a processor, wherein the processor comprises: a main memory configured to hold instructions wherein each instruction comprises at least one instruction syllable; a cache memory comprising a plurality of ways, each way comprising: a first set of storage locations configured to hold instruction syllables and a cache tag where the instruction syllables are addressable by a first group of addresses, and a second set of storage locations configured to hold syllables and a cache tag in main memory where the instruction syllables are addressable by a second group of addresses; addressing circuitry configured to provide in each addressing cycle a pair of addresses comprising one from the first group and one from the second group, thereby accessing a plurality of instruction syllables from each set of storage locations; selection circuitry configured to select from said plurality of instruction syllables to output based on whether a required instruction syllable is addressable by an address in the first or second group; and switching circuitry configured to select from outputted instruction syllables, syllables associated with one of said ways, based on comparing at least part of the addresses provided by the addressing circuitry with the cache tags held in the storage locations; a plurality of execution lanes, each lane being supplied with an instruction syllable selected from said outputted instruction syllables; and a buffer configured to hold the instruction syllables selected from said outputted instruction syllables and to bundle the instruction syllables in accordance with a VLIW (very long instruction word) format.
 21. A method of operating a cache memory comprising a first set of storage locations (B0e) for holding instruction syllables and addressable by a first group of addresses, the method comprising: providing in each addressing cycle a pair of addresses comprising one from the first group and one from a second group, thereby accessing a plurality of instruction syllables from each set of storage locations; selecting from the plurality of instruction syllables to output to a processor lane based on whether a required instruction syllable is addressable by an address in the first or second group; and buffering the outputted instruction syllables; and bundling the instruction syllables in accordance with a VLIW (very long instruction word) format.
 22. The method according to claim 21, wherein the first group of addresses are odd addresses and the second group of addresses are even addresses.
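By way of illustration only, the addressing and selecting steps of the method of claims 21 and 22 might be modelled in software as follows. This is a minimal sketch and not the claimed circuitry. It assumes 128 bit rows of four 32 bit syllables, as in the background example, and it models a single way with no cache tags, buffering or bundling. The names split_banks, address_pair and fetch are hypothetical and do not appear in the specification.

```python
SYLLABLES_PER_ROW = 4  # assumption: 128-bit row holding four 32-bit syllables


def split_banks(memory):
    """Split a flat list of syllables into an odd-row bank and an even-row bank."""
    rows = [memory[i:i + SYLLABLES_PER_ROW]
            for i in range(0, len(memory), SYLLABLES_PER_ROW)]
    odd = {r: row for r, row in enumerate(rows) if r % 2 == 1}
    even = {r: row for r, row in enumerate(rows) if r % 2 == 0}
    return odd, even


def address_pair(start):
    """Return (odd_row_address, even_row_address) for one addressing cycle.

    `start` is the syllable index of the first syllable of the bundle.
    The row holding it and the next row always have opposite parity.
    """
    row = start // SYLLABLES_PER_ROW
    nxt = row + 1
    return (row, nxt) if row % 2 == 1 else (nxt, row)


def fetch(odd_bank, even_bank, start, width=SYLLABLES_PER_ROW):
    """Read both banks once, then pick each lane's syllable from the right bank."""
    odd_addr, even_addr = address_pair(start)
    empty = [None] * SYLLABLES_PER_ROW  # stands in for a row that is not present
    odd_row = odd_bank.get(odd_addr, empty)
    even_row = even_bank.get(even_addr, empty)
    lanes = []
    for lane in range(width):
        row, col = divmod(start + lane, SYLLABLES_PER_ROW)
        # Selection: is the required syllable addressable by an odd or an even address?
        source = odd_row if row % 2 == 1 else even_row
        lanes.append(source[col])
    return lanes


if __name__ == "__main__":
    odd_bank, even_bank = split_banks(list(range(16)))
    # A misaligned bundle starting at syllable 6 spans rows 1 (odd) and 2 (even).
    print(fetch(odd_bank, even_bank, 6))
```

In this sketch a misaligned bundle is read in a single addressing cycle because its two rows necessarily fall in different banks; an aligned bundle simply takes all of its syllables from one bank.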